91 research outputs found

    Temporal segmentation of video sequences

    The contribution of this work centres on two fronts. First, developing a methodology to evaluate the relative effectiveness of the various techniques for comparing frames of a sequence, whether applied in isolation or in combination; to this end, a statistical tool, the computation of divergences, has been introduced into this research field. Second, proposing a decision-space model valid for detecting any type of shot change (abrupt or gradual), together with a methodology for designing that decision space so that certain guarantees can be given about its effectiveness. Regarding the methodology for comparing disparity functions (functions that indicate the degree of similarity between two frames of a video sequence), aimed at contrasting the greater or lesser suitability of these functions for use in an automatic shot-change detection process, the main contribution lies in the power of the method presented. This power stems both from the possibility of comparing techniques based on the simultaneous use of two or more disparity functions, and from the form of the comparison results: the effectiveness of each technique is evaluated through a process that culminates in a single numerical value, which enables a broad study and, at the same time, a clear and quick analysis of the results. Regarding decision spaces, a unified decision-space model is provided for detecting both abrupt and gradual shot changes (techniques published to date address either abrupt changes, gradual changes, or even each type of gradual change separately, and in any case without establishing an explicit model). The model considers the existence of two subspaces. One is formed by elements resulting from applying one or more disparity functions to the frames of a video sequence; since these elements come from a video sequence, they preserve an ordering relation, that is, they form a series of disparity values. The other subspace is formed by elements resulting from applying one or more filtering functions to the disparity series of the first subspace; these functions are intended to highlight the temporal patterns that shot changes produce in that series. The model is useful not only for detecting gradual shot changes but also for abrupt ones, where it enables a formalised approach to modelling and removing anomalies (flashes, broadcast defects, etc.).
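
    As a rough illustration of what a disparity function and a decision rule look like (the histogram distance and the fixed threshold below are assumptions for the sketch, not the decision-space design described above), a minimal abrupt-cut detector in Python could be:

```python
import numpy as np

def histogram_disparity(frame_a, frame_b, bins=64):
    """L1 distance between normalised grey-level histograms of two frames."""
    h_a, _ = np.histogram(frame_a, bins=bins, range=(0, 255), density=True)
    h_b, _ = np.histogram(frame_b, bins=bins, range=(0, 255), density=True)
    return np.abs(h_a - h_b).sum()

def detect_abrupt_changes(frames, threshold=0.5):
    """Flag frame indices where the disparity series exceeds a fixed threshold."""
    disparity = [histogram_disparity(frames[i], frames[i + 1])
                 for i in range(len(frames) - 1)]
    return [i + 1 for i, d in enumerate(disparity) if d > threshold]
```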

    Accurate segmentation and registration of skin lesion images to evaluate lesion change

    Skin cancer is a major health problem. There are several techniques to help diagnose skin lesions from a captured image. Computer-aided diagnosis (CAD) systems operate on single images of skin lesions, extracting lesion features to classify them and help the specialists. Accurate feature extraction, which in turn depends on precise lesion segmentation, is key for the performance of these systems. In this paper, we present a skin lesion segmentation algorithm based on a novel adaptation of superpixel techniques and achieve the best reported results for the ISIC 2017 challenge dataset. Additionally, CAD systems have paid little attention to a critical criterion in skin lesion diagnosis: the lesion's evolution. This requires operating on two or more images of the same lesion, captured at different times but with a comparable scale, orientation, and point of view; in other words, an image registration process should first be performed. In this work we also propose an image registration approach that outperforms top image registration techniques. Combined with the proposed lesion segmentation algorithm, this allows for the accurate extraction of features to assess the evolution of the lesion. We present a case study with the lesion-size feature, paving the way for the development of automatic systems to easily evaluate skin lesion evolution. This work was supported in part by the Spanish Government (HAVideo, TEC2014-53176-R) and in part by the TEC department (Universidad Autónoma de Madrid).
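
    For illustration only (this is not the adapted superpixel algorithm of the paper), a basic superpixel-based lesion mask can be sketched with SLIC and a simple per-superpixel intensity rule:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.segmentation import slic

def rough_lesion_mask(image_rgb, n_segments=200):
    """Group pixels into SLIC superpixels; keep superpixels darker than the image mean."""
    grey = rgb2gray(image_rgb)
    segments = slic(image_rgb, n_segments=n_segments, compactness=10)
    mask = np.zeros_like(grey, dtype=bool)
    for label in np.unique(segments):
        region = segments == label
        if grey[region].mean() < grey.mean():  # lesions are typically darker than skin
            mask |= region
    return mask
```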

    A corpus for benchmarking of people detection algorithms

    This is the author's version of a work that was accepted for publication in Pattern Recognition Letters. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition Letters, 33, 2 (2012), DOI: 10.1016/j.patrec.2011.09.038. This paper describes a corpus, dataset and associated ground-truth for the evaluation of people detection algorithms in surveillance video scenarios, along with the design procedure followed to generate it. Sequences from scenes with different levels of complexity have been manually annotated. Each person present in a scene has been labeled frame by frame, in order to automatically obtain a people detection ground-truth for each sequence. Sequences have been classified into different complexity categories depending on critical factors that typically affect the behavior of detection algorithms. The resulting corpus, which exceeds other public pedestrian datasets in the number of video sequences and in its complexity variability, is freely available for benchmarking and research purposes under a license agreement. This work has been partially supported by the Cátedra UAM-Infoglobal ("Nuevas tecnologías de vídeo aplicadas a sistemas de video-seguridad"), by the Ministerio de Ciencia e Innovación of the Spanish Government (TEC2011-25995 EventVideo: "Estrategias de segmentación, detección y seguimientos de objetos en entornos complejos para la detección de eventos en videovigilancia y monitorización") and by the Universidad Autónoma de Madrid ("FPI-UAM: Programa propio de ayudas para la Formación de Personal Investigador").
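
    A frame-level ground truth of this kind is typically stored as per-frame bounding boxes. One common way to benchmark a detector against it (sketched below as a generic intersection-over-union matching rule, not the evaluation protocol defined for this corpus) is:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def true_positives(detections, ground_truth, threshold=0.5):
    """Count detections that overlap a not-yet-matched annotation by at least `threshold`."""
    matched, tp = set(), 0
    for det in detections:
        best = max(range(len(ground_truth)),
                   key=lambda i: iou(det, ground_truth[i]), default=None)
        if best is not None and best not in matched and iou(det, ground_truth[best]) >= threshold:
            matched.add(best)
            tp += 1
    return tp
```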

    Automatic semantic parsing of the ground-plane in scenarios recorded with multiple moving cameras

    Nowadays, video surveillance scenarios usually rely on manually annotated focus areas to constrain automatic video analysis tasks. Whereas manual annotation simplifies several stages of the analysis, its use hinders the scalability of the developed solutions and might induce operational problems in scenarios recorded with Multiple and Moving Cameras (MMC). To tackle these problems, an automatic method for the cooperative extraction of Areas of Interest (AoIs) is proposed. Each captured frame is segmented into regions with semantic roles using a state-of-the-art method. Semantic evidence from different time instants, cameras and points of view is then spatio-temporally aligned on a common ground plane. Experimental results on widely used datasets recorded with multiple but static cameras suggest that this process provides broader and more accurate AoIs than those manually defined in the datasets. Moreover, the proposed method naturally determines the projection of obstacles and functional objects in the scene, paving the way towards systems focused on the automatic analysis of human behaviour. To our knowledge, this is the first study addressing this problem, as evidenced by the lack of publicly available MMC benchmarks. To also cope with this issue, we provide a new MMC dataset with associated semantic scene annotations. This study has been partially supported by the Spanish Government through its TEC2014-53176-R HAVideo project.
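
    The ground-plane alignment step can be illustrated with per-camera homographies that warp each semantic mask onto a common top-view grid, followed by a simple vote across cameras (the homographies, grid size and voting rule below are illustrative assumptions, not the alignment procedure of the paper):

```python
import numpy as np
import cv2

def fuse_walkable_masks(masks, homographies, plane_size=(500, 500), min_votes=2):
    """Warp each camera's binary 'walkable' mask onto a common ground-plane grid
    and keep cells supported by at least `min_votes` cameras."""
    votes = np.zeros(plane_size, dtype=np.uint16)
    for mask, H in zip(masks, homographies):
        warped = cv2.warpPerspective(mask.astype(np.uint8), H,
                                     (plane_size[1], plane_size[0]))
        votes += (warped > 0).astype(np.uint16)
    return votes >= min_votes
```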

    Crop classification based on temporal signatures of Sentinel-1 observations over Navarre province, Spain

    Crop classification provides relevant information for crop management, food security assurance and agricultural policy design. The availability of Sentinel-1 image time series, with a very short revisit time and high spatial resolution, has great potential for crop classification in regions with pervasive cloud cover. Dense image time series enable the implementation of supervised crop classification schemes based on comparing the time series of the element to classify with the temporal signatures of the considered crops. The main objective of this study is to investigate the performance of a supervised crop classification approach based on crop temporal signatures obtained from Sentinel-1 time series in a challenging case study with a large number of crops and high heterogeneity in terms of agro-climatic conditions and field sizes. The case study considered a large dataset covering the Spanish province of Navarre in the framework of the verification of Common Agricultural Policy (CAP) subsidies. Navarre presents large agro-climatic diversity with areas of persistent cloud cover; therefore, the technique was implemented at both the provincial and regional scales. In total, 14 crop classes were considered, including different winter crops, summer crops, permanent crops and fallow. Classification results varied depending on the set of input features considered, obtaining Overall Accuracies higher than 70% when the three channels (VH, VV and VH/VV) were used as input. Crops exhibiting singularities in their temporal signatures were more easily identified, with barley, rice, corn and wheat achieving F1-scores above 75%. Field size severely affected classification performance, with ~14% better classification performance for larger fields (>1 ha) in comparison to smaller fields (<0.5 ha). Results improved when agro-climatic diversity was taken into account through regional stratification. Regions with a higher diversity of crop types and management techniques and a larger proportion of fallow fields obtained lower accuracies. The approach is simple and can easily be implemented operationally to aid CAP inspection procedures or for other purposes. © 2020 by the authors. This work was supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (MINECO/FEDER-UE) through a project (CGL2016-75217-R) and a grant (BES-2017-080560). It was also partly funded by the project PyrenEOS EFA 048/15, which has been 65% co-financed by the European Regional Development Fund (ERDF) through the Interreg V-A Spain-France-Andorra programme (POCTEFA 2014-2020).
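
    The signature-matching idea behind this kind of classifier can be sketched as a nearest-signature rule over backscatter time series (the feature layout and distance choice here are assumptions, not the exact scheme of the study):

```python
import numpy as np

def classify_parcel(parcel_series, crop_signatures):
    """
    parcel_series: array of shape (T, 3) with VH, VV and VH/VV values over time.
    crop_signatures: dict mapping crop name -> reference array of shape (T, 3).
    Returns the crop whose temporal signature is closest in Euclidean distance.
    """
    distances = {crop: np.linalg.norm(parcel_series - signature)
                 for crop, signature in crop_signatures.items()}
    return min(distances, key=distances.get)
```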

    DiVA: A Distributed Video Analysis framework applied to video-surveillance systems

    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. J. C. San Miguel, J. Bescós, J. M. Martínez, and Á. García, "DiVA: A Distributed Video Analysis Framework Applied to Video-Surveillance Systems", in WIAMIS '08: Ninth International Workshop on Image Analysis for Multimedia Interactive Services, Klagenfurt (Germany), 2008, pp. 207-210. This paper describes a generic, scalable and distributed framework for real-time video analysis intended for research, prototyping and service-deployment purposes. The architecture considers multiple cameras and is based on a server/client model. The information generated by each analysis module and the context information are made accessible to the whole system through a database system. System modules can be interconnected in several ways, thus achieving flexibility. The two main design criteria have been low computational cost and easy component integration. The experimental results show the potential use of this system. This work is supported by the Cátedra Infoglobal-UAM for "Nuevas Tecnologías de video aplicadas a la seguridad", by the Spanish Government (TEC2007-65400 SemanticVideo), by the Comunidad de Madrid (S-050/TIC-0223, ProMultiDis-CM), by the Consejería de Educación of the Comunidad de Madrid and by the European Social Fund.
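
    A minimal sketch of the shared-database idea, using SQLite purely for illustration (the framework's actual storage layer and schema are not described here): each analysis module publishes its results so that other modules can consume them.

```python
import sqlite3

def init_store(path="diva_results.db"):
    """Create (if needed) a shared table where modules publish per-frame results."""
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS results (
                       camera_id INTEGER, frame INTEGER,
                       module TEXT, payload TEXT)""")
    return con

def publish(con, camera_id, frame, module, payload):
    """Called by an analysis module to make its output visible to the whole system."""
    con.execute("INSERT INTO results VALUES (?, ?, ?, ?)",
                (camera_id, frame, module, payload))
    con.commit()

def read(con, camera_id, frame, module):
    """Called by downstream modules that consume another module's output."""
    cur = con.execute(
        "SELECT payload FROM results WHERE camera_id=? AND frame=? AND module=?",
        (camera_id, frame, module))
    return [row[0] for row in cur.fetchall()]
```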

    Spacecraft Pose Estimation Based on Unsupervised Domain Adaptation and on a 3D-Guided Loss Combination

    Spacecraft pose estimation is a key task to enable space missions in which two spacecraft must navigate around each other. Current state-of-the-art algorithms for pose estimation employ data-driven techniques. However, there is an absence of real training data for spacecraft imaged in space conditions due to the costs and difficulties associated with the space environment. This has motivated the introduction of 3D data simulators, which solve the issue of data availability but introduce a large gap between the training (source) and test (target) domains. We explore a method that incorporates 3D structure into the spacecraft pose estimation pipeline to provide robustness to intensity domain shift, and we present an algorithm for unsupervised domain adaptation with robust pseudo-labelling. Our solution ranked second in the two categories of the 2021 Pose Estimation Challenge organised by the European Space Agency and Stanford University, achieving the lowest average error over the two categories. Comment: Accepted at the ECCV 2022 AI4SPACE Workshop (https://aiforspace.github.io/2022/).
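
    One way 3D structure can support robust pseudo-labelling, sketched below with OpenCV's PnP solver (the keypoint model, camera matrix and error threshold are illustrative assumptions, not the authors' algorithm), is to keep only target-domain predictions whose estimated pose reprojects the known 3D keypoints close to the predicted 2D locations:

```python
import numpy as np
import cv2

def accept_pseudo_label(points_3d, points_2d, camera_matrix, max_error_px=5.0):
    """
    points_3d: (N, 3) known keypoints on the spacecraft model.
    points_2d: (N, 2) keypoints predicted on a target-domain image.
    Returns (accepted, rvec, tvec): the estimated pose and whether the mean
    reprojection error is small enough to use the prediction as a pseudo-label.
    """
    ok, rvec, tvec = cv2.solvePnP(points_3d.astype(np.float32),
                                  points_2d.astype(np.float32),
                                  camera_matrix, None)
    if not ok:
        return False, None, None
    projected, _ = cv2.projectPoints(points_3d.astype(np.float32), rvec, tvec,
                                     camera_matrix, None)
    error = np.linalg.norm(projected.reshape(-1, 2) - points_2d, axis=1).mean()
    return error <= max_error_px, rvec, tvec
```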

    Semantic-Aware Scene Recognition

    Scene recognition is currently one of the most challenging research fields in computer vision. This may be due to the ambiguity between classes: images of several scene classes may share similar objects, which causes confusion among them. The problem is aggravated when images of a particular scene class are notably different from one another. Convolutional Neural Networks (CNNs) have significantly boosted performance in scene recognition, although it is still far below that of other recognition tasks (e.g., object or image recognition). In this paper, we describe a novel approach for scene recognition based on an end-to-end multi-modal CNN that combines image and context information by means of an attention module. Context information, in the shape of semantic segmentation, is used to gate features extracted from the RGB image by leveraging the information encoded in the semantic representation: the set of scene objects and stuff, and their relative locations. This gating process reinforces the learning of indicative scene content and enhances scene disambiguation by refocusing the receptive fields of the CNN towards them. Experimental results on four publicly available datasets show that the proposed approach outperforms every other state-of-the-art method while significantly reducing the number of network parameters. All the code and data used in this paper are available at https://github.com/vpulab/Semantic-Aware-Scene-Recognition. Comment: Paper submitted for publication to the Elsevier Pattern Recognition journal.
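
    The gating idea can be illustrated with a small PyTorch module (the channel dimensions and single-convolution gate are assumptions for illustration; the paper's attention module is more elaborate):

```python
import torch
import torch.nn as nn

class SemanticGate(nn.Module):
    """Gate RGB features with an attention map derived from semantic features."""
    def __init__(self, rgb_channels, semantic_channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(semantic_channels, rgb_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_features, semantic_features):
        # Attention values in [0, 1] re-weight each RGB feature channel and location.
        return rgb_features * self.gate(semantic_features)

# Usage sketch: gated = SemanticGate(512, 151)(rgb_feat, sem_feat)
# with rgb_feat of shape (B, 512, H, W) and sem_feat of shape (B, 151, H, W).
```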